
@AlannaBurke
Contributor

I split the docs work into two PRs; this is the second.

It builds on #448, so let's merge that first.

AlannaBurke and others added 20 commits October 8, 2025 16:58
…rehensive getting started with dependency details, and expanded concepts page with Monarch, Services, TorchStore, and RL workflows
Co-authored-by: Svetlana Karslioglu <[email protected]>
- Enhanced homepage with Monarch foundation emphasis, technology stack highlights, validated examples, and clear navigation paths
- Expanded getting started with detailed dependency explanations (Monarch, vLLM, TorchTitan, TorchStore, PyTorch Nightly)
- Converted installation and verification steps to numbered lists for better readability
- Removed FAQ references as FAQ page has been removed
- Fixed GPU/process terminology in code examples
- Split long concepts page into: Concepts (overview), Architecture (Monarch/Services/TorchStore), Technology Stack (dependencies), and RL Workflows (writing algorithms)
- Added nested toctree under concepts for better navigation in sidebar
- Each page focuses on a single aspect with clear cross-references
- Improved readability and maintainability of documentation
- Fixed GPU/process terminology throughout architecture and workflow examples
@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) Oct 17, 2025
@svekars
Contributor

svekars commented Oct 17, 2025

**Ephemeral Infrastructure**
: Services are created with your job and torn down when finished. Want to try a new reward model? Change your Python code. No standing deployments to maintain, no infrastructure to provision ahead of time.

## TorchStore: Distributed Weight Storage
Contributor


this all looks good to me, but cc @LucasLLC in case of any desired big changes

@@ -0,0 +1,120 @@
# Technology Stack

Contributor

@allenwang28 left a comment


It mostly looks good. @AlannaBurke, please ping me once the changes are incorporated and this PR focuses only on concepts.md!

- **Reservoir**: Uniform sampling from history
- **Hybrid**: Mix multiple strategies

Integrates with TorchStore for efficient distributed storage.
Contributor


Let's remove this line; unfortunately, it isn't true right now.


You choose your fault tolerance granularity based on your needs.

## Best Practices
Contributor


Let's remove the Best Practices section for now, since we haven't settled on them ourselves yet.

- Validate data pipelines before full training
- Monitor loss curves and generation quality

## Validation
Contributor


Let's remove the Validation section here as well.

5 participants